Dataset info
| Number of variables | 8 |
|---|---|
| Number of observations | 10000000 |
| Missing cells | 9981283 (12.5%) |
| Duplicate rows | 221295 (2.2%) |
| Total size in memory | 610.4 MiB |
| Average record size in memory | 64.0 B |
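The percentages in this table can be reproduced from the raw counts. The sketch below is plain arithmetic on the reported figures. It assumes that "Missing cells" is measured against all cells (observations times variables) and that MiB means 2^20 bytes.

```python
# Figures copied from the "Dataset info" table above.
n_variables = 8
n_observations = 10_000_000
missing_cells = 9_981_283
duplicate_rows = 221_295
total_memory_mib = 610.4

# Assumption: missing cells are reported relative to every cell in the frame.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_memory_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

All missing cells come from a single column (attributed_time), which is why the dataset-level share is only about one eighth of that column's 99.8%.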
Variable types
| Numeric | 4 |
|---|---|
| Categorical | 2 |
| Boolean | 1 |
| Date | 0 |
| URL | 0 |
| Text (Unique) | 0 |
| Rejected | 1 |
| Unsupported | 0 |
Warnings
| Message | Category |
|---|---|
| Dataset has 221295 (2.2%) duplicate rows | Warning |
| attributed_time has a high cardinality: 15699 distinct values | Warning |
| attributed_time has 9981283 (99.8%) missing values | Missing |
| click_time only contains datetime values, but is categorical. Consider applying pd.to_datetime() | Type |
| click_time has a high cardinality: 29943 distinct values | Warning |
| os is highly correlated with device (ρ = 0.9682952226) | Rejected |
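The "Rejected" entry means os was dropped from the per-variable analysis because it is almost collinear with device. A minimal sketch of that kind of check follows. The sample values and the 0.9 cutoff are assumptions for illustration and are not taken from the report or the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up values, only shaped like the columns in this report.
device = [1, 1, 2, 1, 3032, 1, 2, 0]
os_ = [13, 19, 22, 17, 607, 13, 25, 24]

THRESHOLD = 0.9  # hypothetical cutoff, not stated in the report

rho = pearson(device, os_)
rejected = ["os"] if abs(rho) > THRESHOLD else []
print(f"rho = {rho:.4f}, rejected = {rejected}")
```

With a distribution as skewed as device (93.8% of rows share one value), a few extreme codes can dominate the Pearson coefficient, so a rank-based measure may be worth checking before dropping os from a model.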
app
Numeric
| Distinct count | 332 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 12.8596382 |
|---|---|
| Minimum | 0 |
| Maximum | 675 |
| Zeros (%) | < 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 3 |
| Median | 12 |
| Q3 | 15 |
| 95-th percentile | 26 |
| Maximum | 675 |
| Range | 675 |
| Interquartile range | 12 |
Descriptive statistics
| Standard deviation | 16.52679822 |
|---|---|
| Coef of variation | 1.285168211 |
| Kurtosis | 256.4815165 |
| Mean | 12.8596382 |
| MAD | 7.480779562 |
| Skewness | 11.80146715 |
| Sum | 128596382 |
| Variance | 273.1350595 |
| Memory size | 76.3 MiB |
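Several of the descriptive statistics are derived from one another. The sketch below recomputes the variance, coefficient of variation, and sum for app from the reported mean, standard deviation, and observation count, using only the standard definitions.

```python
# Figures copied from the app tables above.
mean = 12.8596382
std = 16.52679822
n_observations = 10_000_000

variance = std ** 2              # variance is the squared standard deviation
coef_of_variation = std / mean   # dispersion relative to the mean
total = mean * n_observations    # sum of all values

print(f"Variance: {variance:.4f}")
print(f"Coef of variation: {coef_of_variation:.6f}")
print(f"Sum: {total:.0f}")
```

The same relations hold for the channel, device, and ip tables.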
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[0.000e+00 5.000e-01 1.500e+00 2.500e+00 3.500e+00 ... 5.555e+02 5.565e+02 5.620e+02 5.635e+02 6.750e+02], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 12 | 1291185 | 12.9% | |
| 2 | 1202534 | 12.0% | |
| 15 | 1181585 | 11.8% | |
| 3 | 1170412 | 11.7% | |
| 9 | 966839 | 9.7% | |
| 18 | 917820 | 9.2% | |
| 14 | 507491 | 5.1% | |
| 1 | 391508 | 3.9% | |
| 8 | 364361 | 3.6% | |
| 21 | 223823 | 2.2% | |
| Other values (322) | 1782442 | 17.8% |
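The table lists the ten most frequent values and folds everything else into one "Other values (k)" row, where k is the number of remaining distinct values. A small sketch of that aggregation, using `collections.Counter` on made-up data (the helper name `frequency_table` is hypothetical):

```python
from collections import Counter


def frequency_table(values, top_n=10):
    """Return (label, count, percent) rows for the top_n values plus a rest row."""
    counts = Counter(values)
    total = sum(counts.values())
    top = counts.most_common(top_n)
    rows = [(str(value), count, 100 * count / total) for value, count in top]

    n_other = len(counts) - len(top)
    if n_other > 0:
        other_count = total - sum(count for _, count in top)
        rows.append((f"Other values ({n_other})", other_count,
                     100 * other_count / total))
    return rows


# Made-up sample, not taken from the dataset.
sample = [12, 12, 12, 2, 2, 15, 3, 9]
for label, count, pct in frequency_table(sample, top_n=2):
    print(f"| {label} | {count} | {pct:.1f}% |")
```

Applied to the figures above, the ten listed counts sum to 8217558, which leaves 1782442 rows for the 322 remaining values.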
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 95 | < 0.1% | |
| 1 | 391508 | 3.9% | |
| 2 | 1202534 | 12.0% | |
| 3 | 1170412 | 11.7% | |
| 4 | 1567 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 675 | 3 | < 0.1% | |
| 651 | 3 | < 0.1% | |
| 645 | 1 | < 0.1% | |
| 629 | 1 | < 0.1% | |
| 625 | 1 | < 0.1% |
attributed_time
Categorical
| Distinct count | 15699 |
|---|---|
| Unique (%) | 0.2% |
| Missing (%) | 99.8% |
| Missing (n) | 9981283 |
| 2017-11-06 23:36:23 | 6 |
|---|---|
| 2017-11-06 23:53:12 | 5 |
| 2017-11-06 23:28:43 | 5 |
| Other values (15695) | 18701 |
| (Missing) | 9981283 |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2017-11-06 23:36:23 | 6 | < 0.1% | |
| 2017-11-06 23:53:12 | 5 | < 0.1% | |
| 2017-11-06 23:28:43 | 5 | < 0.1% | |
| 2017-11-07 00:12:00 | 5 | < 0.1% | |
| 2017-11-06 23:55:44 | 5 | < 0.1% | |
| 2017-11-07 00:03:48 | 5 | < 0.1% | |
| 2017-11-07 00:02:07 | 5 | < 0.1% | |
| 2017-11-06 16:14:02 | 4 | < 0.1% | |
| 2017-11-07 00:04:11 | 4 | < 0.1% | |
| 2017-11-06 23:37:42 | 4 | < 0.1% | |
| Other values (15688) | 18669 | 0.2% | |
| (Missing) | 9981283 | 99.8% |
| Max length | 19 |
|---|---|
| Mean length | 3.0299472 |
| Min length | 3 |
| Contains chars | True |
| Contains digits | True |
| Contains spaces | True |
| Contains non-words | True |
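A mean length of about 3.03 next to a maximum of 19 looks odd for a timestamp column. The figures are consistent with each missing entry being counted as a 3-character string, while every present value has 19 characters. The sketch below checks that arithmetic. The "nan" placeholder is an assumption inferred from the numbers, not something the report states.

```python
# Figures copied from the attributed_time tables above.
n_observations = 10_000_000
n_missing = 9_981_283

timestamp_length = len("2017-11-06 23:36:23")  # 19 characters
placeholder_length = len("nan")                # assumed stand-in for missing values

n_present = n_observations - n_missing
mean_length = (
    n_missing * placeholder_length + n_present * timestamp_length
) / n_observations

print(n_present)
print(mean_length)
```

The same assumption would explain why "Contains chars" is True here but False for click_time, which has no missing values.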
channel
Numeric
| Distinct count | 170 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 252.6604048 |
|---|---|
| Minimum | 0 |
| Maximum | 498 |
| Zeros (%) | < 0.1% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 107 |
| Q1 | 134 |
| Median | 237 |
| Q3 | 377 |
| 95-th percentile | 477 |
| Maximum | 498 |
| Range | 498 |
| Interquartile range | 243 |
Descriptive statistics
| Standard deviation | 130.0375702 |
|---|---|
| Coef of variation | 0.5146733233 |
| Kurtosis | -1.037421756 |
| Mean | 252.6604048 |
| MAD | 109.1975162 |
| Skewness | 0.4725883333 |
| Sum | 2526604048 |
| Variance | 16909.76966 |
| Memory size | 76.3 MiB |
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[ 0. 1.5 3.5 9. 14. ... 488.5 492.5 496.5 497.5 498. ], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 245 | 793105 | 7.9% | |
| 134 | 630888 | 6.3% | |
| 259 | 469845 | 4.7% | |
| 477 | 412559 | 4.1% | |
| 121 | 402226 | 4.0% | |
| 107 | 388035 | 3.9% | |
| 145 | 348862 | 3.5% | |
| 153 | 296832 | 3.0% | |
| 205 | 279720 | 2.8% | |
| 178 | 269720 | 2.7% | |
| Other values (160) | 5708208 | 57.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 45 | < 0.1% | |
| 3 | 77703 | 0.8% | |
| 4 | 7 | < 0.1% | |
| 5 | 6 | < 0.1% | |
| 13 | 1357 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 498 | 15 | < 0.1% | |
| 497 | 24556 | 0.2% | |
| 496 | 553 | < 0.1% | |
| 489 | 119261 | 1.2% | |
| 488 | 1208 | < 0.1% |
click_time
Categorical
| Distinct count | 29943 |
|---|---|
| Unique (%) | 0.3% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| 2017-11-06 16:05:10 | 1261 |
|---|---|
| 2017-11-06 16:05:12 | 1220 |
| 2017-11-06 16:05:11 | 1206 |
| Other values (29940) | 9996313 |
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2017-11-06 16:05:10 | 1261 | < 0.1% | |
| 2017-11-06 16:05:12 | 1220 | < 0.1% | |
| 2017-11-06 16:05:11 | 1206 | < 0.1% | |
| 2017-11-06 16:05:09 | 1198 | < 0.1% | |
| 2017-11-06 16:05:15 | 1197 | < 0.1% | |
| 2017-11-06 16:05:14 | 1194 | < 0.1% | |
| 2017-11-06 16:00:45 | 1187 | < 0.1% | |
| 2017-11-06 16:05:24 | 1176 | < 0.1% | |
| 2017-11-06 16:01:06 | 1174 | < 0.1% | |
| 2017-11-06 16:00:43 | 1173 | < 0.1% | |
| Other values (29933) | 9988014 | 99.9% |
| Max length | 19 |
|---|---|
| Mean length | 19 |
| Min length | 19 |
| Contains chars | False |
| Contains digits | True |
| Contains spaces | True |
| Contains non-words | True |
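Every click_time value is a 19-character string of digits, spaces, and punctuation, which matches a "YYYY-MM-DD HH:MM:SS" timestamp stored as text. A minimal sketch of the conversion the report recommends, using `pd.to_datetime` on a made-up frame (the rows and the explicit format string are assumptions for illustration):

```python
import pandas as pd

# Made-up rows in the same layout as the values shown in this report.
df = pd.DataFrame({
    "click_time": [
        "2017-11-06 16:05:10",
        "2017-11-06 16:05:12",
        "2017-11-06 16:00:45",
    ],
    "attributed_time": ["2017-11-06 23:36:23", None, None],
})

fmt = "%Y-%m-%d %H:%M:%S"  # assumed layout, inferred from the sample values
for column in ("click_time", "attributed_time"):
    # errors="coerce" turns missing or unparseable entries into NaT.
    df[column] = pd.to_datetime(df[column], format=fmt, errors="coerce")

print(df.dtypes)
print(df["click_time"].dt.hour.tolist())
```

Once the column is a datetime, the high-cardinality warning no longer applies in the same way, and derived features such as hour or minute can be profiled as numeric variables.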
device
Numeric
| Distinct count | 940 |
|---|---|
| Unique (%) | < 0.1% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 33.0387117 |
|---|---|
| Minimum | 0 |
| Maximum | 3545 |
| Zeros (%) | 0.5% |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| Median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 3545 |
| Range | 3545 |
| Interquartile range | 0 |
Descriptive statistics
| Standard deviation | 308.8297662 |
|---|---|
| Coef of variation | 9.347512366 |
| Kurtosis | 90.16965418 |
| Mean | 33.0387117 |
| MAD | 63.26048567 |
| Skewness | 9.596435052 |
| Sum | 330387117 |
| Variance | 95375.82448 |
| Memory size | 76.3 MiB |
Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[0.0000e+00 5.0000e-01 1.5000e+00 3.0000e+00 5.0000e+00 ... 3.0325e+03 3.0345e+03 3.1590e+03 3.1660e+03 3.5450e+03], "bayesian blocks" binning strategy used)
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 9381146 | 93.8% | |
| 2 | 456617 | 4.6% | |
| 3032 | 104393 | 1.0% | |
| 0 | 46476 | 0.5% | |
| 59 | 1618 | < 0.1% | |
| 40 | 462 | < 0.1% | |
| 6 | 458 | < 0.1% | |
| 16 | 334 | < 0.1% | |
| 18 | 247 | < 0.1% | |
| 33 | 204 | < 0.1% | |
| Other values (930) | 8045 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 46476 | 0.5% | |
| 1 | 9381146 | 93.8% | |
| 2 | 456617 | 4.6% | |
| 4 | 60 | < 0.1% | |
| 6 | 458 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3545 | 1 | < 0.1% | |
| 3537 | 1 | < 0.1% | |
| 3527 | 1 | < 0.1% | |
| 3525 | 1 | < 0.1% | |
| 3524 | 1 | < 0.1% |
ip
Numeric
| Distinct count | 68740 |
|---|---|
| Unique (%) | 0.7% |
| Missing (%) | 0.0% |
| Missing (n) | 0 |
| Infinite (%) | 0.0% |
| Infinite (n) | 0 |
| Mean | 87331.72281 |
|---|---|
| Minimum | 9 |
| Maximum | 212774 |
| Zeros (%) | 0.0% |
Quantile statistics
| Minimum | 9 |
|---|---|
| 5-th percentile | 6976 |
| Q1 | 42164 |
| Median | 81973 |
| Q3 | 121187 |
| 95-th percentile | 193521 |
| Maximum | 212774 |
| Range | 212765 |
| Interquartile range | 79023 |
Descriptive statistics
| Standard deviation | 55675.27388 |
|---|---|
| Coef of variation | 0.6375148925 |
| Kurtosis | -0.681793417 |
| Mean | 87331.72281 |
| MAD | 46048.22713 |
| Skewness | 0.4248442712 |
| Sum | 8.733172281e+11 |
| Variance | 3099736122 |
| Memory size | 76.3 MiB |
Histogram with fixed size bins (bins=50)